Limitations of Learning Via Embeddings
Author
Abstract
This paper considers the embeddability of general concept classes in Euclidean half spaces. By embedding in half spaces we refer to a mapping from some concept class to half spaces so that the labeling given to points in the instance space is retained. The existence of an embedding for some class may be used to learn it using an algorithm for the class it is embedded into. The Support Vector Machines paradigm employs this idea for the construction of a general learning system. We show that an overwhelming majority of the family of finite concept classes of constant VC dimension d cannot be embedded in low-dimensional half spaces. (In fact, we show that the Euclidean dimension must be almost as high as the size of the instance space.) We strengthen this result even further by showing that an overwhelming majority of the family of finite concept classes of constant VC dimension d cannot be embedded in half spaces (of arbitrarily high Euclidean dimension) with a large margin. (In fact, the margin cannot be substantially larger than the margin achieved by the trivial embedding.) Furthermore, these bounds are robust in the sense that allowing each image half space to err on a small fraction of the instances does not significantly weaken the dimension and margin bounds. Our results indicate that any universal learning machine, which transforms data into Euclidean space and then applies linear (or large-margin) classification, cannot enjoy any meaningful generalization guarantees based on either VC dimension or margin considerations.
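To make the abstract's objects concrete, here is an editorial sketch (the map phi, the vectors w_c, the margin symbol gamma, and the use of Python/numpy are illustrative assumptions, not the paper's notation or construction). Embedding a concept class C over a finite instance space X into half spaces of R^n means choosing a map phi : X -> R^n and, for every concept c in C, a weight vector w_c in R^n such that sign(<w_c, phi(x)>) = c(x) for every instance x; the embedding achieves margin gamma if, after normalizing w_c and phi(x) to unit length, |<w_c, phi(x)>| >= gamma for all x and c. The sketch below exhibits one simple embedding in this spirit: mapping the i-th of m instances to the i-th standard basis vector realizes every labeling in dimension m with margin 1/sqrt(m).

    import numpy as np

    # Sketch of a simple half-space embedding for a finite concept class.
    # Whether this coincides with the paper's "trivial embedding" is an assumption;
    # it only illustrates a baseline of dimension m and margin 1/sqrt(m).
    m = 8                                          # size of the instance space (toy value)
    rng = np.random.default_rng(0)
    concepts = rng.choice([-1, 1], size=(5, m))    # a toy concept class: 5 labelings of m points

    phi = np.eye(m)                                # phi(x_i) = e_i, the i-th standard basis vector
    for c in concepts:
        w_c = c / np.linalg.norm(c)                # unit-norm weight vector representing concept c
        scores = phi @ w_c                         # <w_c, phi(x_i)> for every instance x_i
        assert np.all(np.sign(scores) == c)        # the labeling is retained
        print("margin:", np.min(np.abs(scores)))   # equals 1/sqrt(m) for every concept

Every concept comes out with the same margin 1/sqrt(m) (about 0.35 for m = 8) at the cost of dimension m; the abstract's lower bounds say that, for most finite classes of constant VC dimension, neither the dimension nor the margin can be improved substantially over such a baseline.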
Similar Resources
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering
In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...
Online Learning of Interpretable Word Embeddings
Word embeddings encode semantic meanings of words into low-dimensional word vectors. In most word embeddings, one cannot interpret the meanings of specific dimensions of those word vectors. Nonnegative matrix factorization (NMF) has been proposed to learn interpretable word embeddings via non-negative constraints. However, NMF methods suffer from scale and memory issues because they have to mainta...
Learning Lexical Embeddings with Syntactic and Lexicographic Knowledge
We propose two improvements on lexical association used in embedding learning: factorizing individual dependency relations and using lexicographic knowledge from monolingual dictionaries. Both proposals provide low-entropy lexical cooccurrence information, and are empirically shown to improve embedding learning by performing notably better than several popular embedding models in similarity tas...
Syntactico Semantic Word Representations in Multiple Languages
Our project is an extension of the project “Syntactico Semantic Word Representations in Multiple Languages”[1]. The previous project aims to improve the semantic representation of English vocabulary by incorporating local context with global context and by supplying multiple embeddings per word to capture homonymy and polysemy. It also introduces a new neural network architecture that learns the w...
Sentiment Analysis by Joint Learning of Word Embeddings and Classifier
Word embeddings are representations of individual words of a text document in a vector space and they are often useful for performing natural language processing tasks. Current state of the art algorithms for learning word embeddings learn vector representations from large corpora of text documents in an unsupervised fashion. This paper introduces SWESA (Supervised Word Embeddings for Sentiment...
Deep Multilingual Correlation for Improved Word Embeddings
Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of ...